APPROACHES FOR EMPIRICAL BAYES CONFIDENCE INTERVALS


Bradley P. Carlin and Alan E. Gelfand

ABSTRACT 

Parametric empirical Bayes methods of point estimation date to the landmark paper of James and Stein (1961). Interval estimation through parametric empirical Bayes techniques has a somewhat shorter history, which is summarized in the recent paper of Laird and Louis (1987). In the exchangeable case, one obtains a "naive" EB confidence interval by simply taking appropriate percentiles of the estimated posterior distribution of the parameter, where the estimation of the prior parameters ("hyperparameters") is accomplished through the marginal distribution of the data. Unfortunately, these "naive" intervals tend to be too short, since they fail to account for the variability in the estimation of the hyperparameters. That is, they do not attain the desired coverage probability in the "EB" sense defined in Morris (1983a,b). They also provide no statement of conditional calibration (Rubin, 1984).

In this paper we propose a conditional bias correction method for developing EB intervals which corrects these deficiencies in the naive intervals. As an alternative, several authors have suggested use of the marginal posterior in this regard. We attempt to clarify its role in achieving EB coverage. Results of extensive simulation of coverage probability and interval length for these approaches are presented in the context of several illustrative examples.

KEY WORDS: Confidence interval; empirical Bayes; bias correction; parametric bootstrap; conditional calibration.

1. INTRODUCTION


Consider the usual exchangeable Bayesian formulation, that is, given $\theta_i$, the data $Y_{ij}$, $j = 1, \ldots, n_i$, are independent having probability density function $f(y \mid \theta_i)$, $i = 1, \ldots, p$, and the $\theta_i$'s are i.i.d. from some continuous prior distribution having density $\pi(\theta_i \mid \eta)$ over $\Theta$. Our ensuing development assumes $\theta_i$ is a scalar; however, extension to $\theta_i$ a vector is illustrated in Example 2.4. We shall work in the parametric empirical Bayes (EB) setting (Morris 1983a) and let $\eta$ index the members of the family $\pi$, although $\eta$ could be viewed as indexing all distributions, producing the nonparametric empirical Bayes of Robbins (1983). By construction, the $Y_i = (Y_{i1}, \ldots, Y_{in_i})$ are marginally independent with distribution $m(y_i \mid \eta)$, although within $i$, $Y_{ij}$ and $Y_{ij'}$ are not independent. The joint marginal distribution of all the data, $Y = (Y_1, \ldots, Y_p)$, is thus $m(Y \mid \eta) = \prod_{i=1}^{p} m(Y_i \mid \eta)$. Finally, let $f(\theta_i \mid y_i, \eta)$ denote the posterior distribution of $\theta_i$.

In the fully Bayesian setting, one chooses a value for $\eta$ (based on subjective information or prior knowledge) and then bases all inference about $\theta_i$ on $f(\theta_i \mid y_i, \eta)$. Familiar confidence intervals for $\theta_i$ based upon this posterior distribution include

• equal tail, where we take the lower and upper $\alpha/2$ points of $f(\theta_i \mid y_i, \eta)$, respectively, as our interval. If we let $q_\alpha(y_i, \eta)$ be the $\alpha$th quantile of $f(\theta_i \mid y_i, \eta)$, we may write this interval as

$$\left(q_{\alpha/2}(y_i, \eta),\; q_{1-\alpha/2}(y_i, \eta)\right) \tag{1.1}$$

• highest posterior density (see Berger, 1985), where we take all $\theta_i \in S$ such that $f(\theta_i \mid y_i, \eta) \ge c(\alpha)$ and $P(\theta_i \in S) = 1 - \alpha$. If our posterior is unimodal we obtain an interval

$$\left(q_{\alpha_1}(y_i, \eta),\; q_{1-\alpha_2}(y_i, \eta)\right), \qquad \alpha_1 + \alpha_2 = \alpha. \tag{1.2}$$
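As a rough illustration of (1.1) and (1.2), the following Python sketch computes both intervals from any posterior quantile function. It assumes a unimodal posterior, so that (1.2) can be found by minimizing the length $q_{1-\alpha+\alpha_1} - q_{\alpha_1}$ over $\alpha_1 \in [0, \alpha]$; the golden-section search and the function names are our own illustrative choices, not part of the methods discussed here.

```python
import math
from statistics import NormalDist


def equal_tail_interval(quantile, alpha):
    """Interval (1.1): the lower and upper alpha/2 posterior quantiles."""
    return quantile(alpha / 2), quantile(1 - alpha / 2)


def hpd_interval(quantile, alpha, tol=1e-10):
    """Interval (1.2) for a unimodal posterior.

    Among the intervals (q_{alpha1}, q_{1-alpha2}) with alpha1 + alpha2 = alpha,
    the HPD interval is the shortest one. For a unimodal density the length is
    unimodal in alpha1, so a golden-section search over [0, alpha] finds it.
    """
    def length(a1):
        return quantile(1 - alpha + a1) - quantile(a1)

    golden = (math.sqrt(5) - 1) / 2
    lo, hi = 0.0, alpha
    c, d = hi - golden * (hi - lo), lo + golden * (hi - lo)
    while hi - lo > tol:
        if length(c) < length(d):      # minimum lies in [lo, d]
            hi, d = d, c
            c = hi - golden * (hi - lo)
        else:                          # minimum lies in [c, hi]
            lo, c = c, d
            d = lo + golden * (hi - lo)
    a1 = (lo + hi) / 2
    return quantile(a1), quantile(1 - alpha + a1)


if __name__ == "__main__":
    post = NormalDist(1.0, 2.0)                       # symmetric posterior
    print(equal_tail_interval(post.inv_cdf, 0.05), hpd_interval(post.inv_cdf, 0.05))
    rate = 2.0                                        # skewed: exponential
    exp_q = lambda u: -math.log(1 - u) / rate
    print(equal_tail_interval(exp_q, 0.05), hpd_interval(exp_q, 0.05))
```

For a symmetric posterior the two intervals coincide; for a skewed one, such as an exponential, the HPD interval moves toward the mode and is shorter.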
In the EB setting we view $\eta$ as unknown, and use $m(Y \mid \eta)$ to obtain an estimator $\hat\eta(Y)$. EB point estimation based upon the resulting "estimated posterior," $f(\theta_i \mid y_i, \hat\eta)$, has been well discussed (see Berger, 1985). Best choices of $\hat\eta$ (e.g., MLE, UMVUE, moments estimator) in a decision theoretic sense usually require case by case investigation. This same problem arises in developing EB confidence intervals. The "naive" EB confidence intervals based upon $f(\theta_i \mid y_i, \hat\eta)$ corresponding to (1.1) and (1.2) are, respectively,

$$\left(q_{\alpha/2}(y_i, \hat\eta),\; q_{1-\alpha/2}(y_i, \hat\eta)\right) \tag{1.3}$$

and

$$\left(q_{\alpha_1}(y_i, \hat\eta),\; q_{1-\alpha_2}(y_i, \hat\eta)\right). \tag{1.4}$$

These intervals are called "naive" because they ignore the randomness in $\hat\eta$. While relatively easy to compute, they are often too short, inappropriately centered, or both. More precisely, for $\theta_i$, Morris (1983a,b) defines an EB confidence set of size $1 - \alpha$ as a subset $t_\alpha(Y)$ of $\Theta$ such that $P_\eta(\theta_i \in t_\alpha(Y)) \ge 1 - \alpha$, where the probability is calculated over the joint distribution of $\theta_i$ and $Y$. This definition becomes more appealing if the inequality is replaced by approximate equality. Hence we shall say that $t_\alpha(Y)$ is an unconditional $1 - \alpha$ EB confidence set for $\theta_i$ if and only if for each $\eta$,

$$P_\eta\left(\theta_i \in t_\alpha(Y)\right) \approx 1 - \alpha. \tag{1.5}$$

Rubin (1984) has observed that (1.5) is "a fairly weak statement in the absence of statements about calibration conditional on characteristics of the data." We concur and hence modify (1.5) to an approximately conditional statement given a suitable summary of the data, $b(Y)$. That is, $t_\alpha(Y)$ is a conditional $1 - \alpha$ EB confidence set for $\theta_i$ given $b(Y)$ if and only if for each $\eta$ and $b(Y) = b$,

$$P_\eta\left(\theta_i \in t_\alpha(Y) \mid b(Y) = b\right) \approx 1 - \alpha. \tag{1.6}$$
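The following sketch estimates the unconditional coverage (1.5) of the naive interval (1.3) by simulation. For concreteness it borrows the normal/normal setting of Example 2.3 below ($\sigma^2 = 1$, $B$ known, $\mu$ estimated by $\hat\mu = \bar Y$). The closed form in `exact_naive_coverage` is our own short derivation for that setting (given $Y$, $\theta_1$ minus the naive center is $N(B(\mu - \hat\mu), 1 - B)$, and $\hat\mu \sim N(\mu, 1/(Bp))$), not a formula from the text; the function names are illustrative.

```python
import random
from statistics import NormalDist, fmean

STD = NormalDist()


def naive_interval(y, i, B, alpha):
    """Naive equal-tail EB interval (1.3) for theta_i: normal/normal model,
    sigma^2 = 1, B = 1/(1 + tau^2) known, mu replaced by mu_hat = mean(y)."""
    mu_hat = fmean(y)
    center = B * mu_hat + (1 - B) * y[i]
    half = STD.inv_cdf(1 - alpha / 2) * (1 - B) ** 0.5
    return center - half, center + half


def mc_eb_coverage(mu, B, p, alpha, reps, seed=1):
    """Monte Carlo estimate of P_eta(theta_1 in interval), i.e. (1.5):
    theta and Y are drawn jointly for the fixed hyperparameter mu."""
    rng = random.Random(seed)
    tau = ((1 - B) / B) ** 0.5          # from B = 1 / (1 + tau^2)
    hits = 0
    for _ in range(reps):
        theta = [rng.gauss(mu, tau) for _ in range(p)]
        y = [rng.gauss(t, 1.0) for t in theta]
        lo, hi = naive_interval(y, 0, B, alpha)
        hits += lo <= theta[0] <= hi
    return hits / reps


def exact_naive_coverage(B, p, alpha):
    """theta_1 minus the naive center is N(0, 1 - B + B/p) unconditionally,
    while the naive half-width uses only sqrt(1 - B)."""
    z = STD.inv_cdf(1 - alpha / 2)
    return 2 * STD.cdf(z * ((1 - B) / (1 - B + B / p)) ** 0.5) - 1


if __name__ == "__main__":
    print(mc_eb_coverage(mu=3.0, B=0.5, p=5, alpha=0.05, reps=20000))
    print(exact_naive_coverage(B=0.5, p=5, alpha=0.05))
```

With $B = 0.5$ and $p = 5$ the exact coverage is about 0.926 rather than the nominal 0.95, which is the kind of shortfall the bias correction of Section 2 is meant to remove.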

The naive intervals (1.3) and (1.4) generally fail to satisfy both (1.5) and (1.6). In Section 2, we introduce a method for correcting the naive interval (1.3) to meet (1.6), where correction is made conditionally on $b(Y) = Y_i$, the sufficient statistic for the posterior. Theoretical results and empirical work show, in response to Rubin, that roughly nominal coverage conditionally on $Y_i$ ensues. This in turn ensures that unconditional nominal coverage (1.5) will be roughly achieved. The method is applied to several examples, including simultaneous simple linear regression.

Several authors (Deely and Lindley 1981, Rubin 1982, Morris 1983a,b, 1987, Laird and Louis 1987, and Pepple 1988) have employed a hyperprior on $\eta$ to adjust confidence intervals based upon the estimated posterior to reflect the uncertainty in $\hat\eta$. The proposal is to use corresponding quantiles of the resulting "marginal posterior" in place of those of the estimated posterior. This additional integration (mixing) produces a distribution which has more spread than the estimated posterior, and hence produces intervals longer than the naive ones. In Section 3 we explore the link between using the marginal posterior and satisfying (1.5). In Section 4, we present simulation results of coverage probabilities and interval lengths for these approaches in the context of the aforementioned examples. We summarize our findings in Section 5.

2. THE BIAS CORRECTED NAIVE APPROACH

Efron (1987) proposed a general framework for correcting the bias in naive EB intervals. In the exchangeable case, a direct conditional bias correction may be developed as follows. We consider confidence sets for $\theta_i$ given $b(Y) = Y_i$. Taking $i = 1$ w.l.o.g., recall that $q_\alpha(y_1, \eta)$ is such that

$$P\left(\theta_1 \le q_\alpha(y_1, \eta) \mid \theta_1 \sim f(\theta_1 \mid y_1, \eta)\right) = \alpha. \tag{2.1}$$

Define

$$r(\hat\eta, \eta, y_1, \alpha) = P\left(\theta_1 \le q_\alpha(y_1, \hat\eta) \mid \theta_1 \sim f(\theta_1 \mid y_1, \eta)\right) \tag{2.2}$$

and finally

$$R(\eta, y_1, \alpha) = E_{\hat\eta \mid y_1, \eta}\, r(\hat\eta, \eta, y_1, \alpha) \tag{2.3}$$

where the expectation is taken over $g(\hat\eta \mid y_1, \eta)$, a density with respect to Lebesgue measure. Note that $R$ depends upon the dimensionality $p$ of the problem as well, but this is suppressed. Since (2.3) need not be close to $\alpha$, we can see why (1.3) and (1.4) usually fail to meet (1.6) for $b(Y) = Y_1$. Suppose we solve

$$R(\eta, y_1, \alpha') = \alpha \tag{2.4}$$

for $\alpha'$. This $\alpha'$ would conditionally "correct the bias" in using $\hat\eta$ in our naive procedure. Applying (2.4) to each tail would produce intervals which meet (1.6) exactly. But of course we cannot solve (2.4), since $\eta$ is unknown. Instead, we propose to solve

$$R(\hat\eta, y_1, \alpha') = \alpha \tag{2.5}$$

to obtain $\alpha' = \alpha'(\hat\eta, y_1, \alpha)$. Then we take as our bias corrected naive EB confidence interval (1.3) (or (1.4)) with "$\alpha$" replaced by "$\alpha'$". In this paper we confine ourselves to the case where the density $g(\hat\eta \mid y_1, \eta)$ is available in closed form. Calculating the left hand side of (2.5) in this case is called a conditional "parametric bootstrap" (Laird and Louis, 1987). When $g(\hat\eta \mid y_1, \eta)$ is not tractable, a conditional "Type III parametric bootstrap" (terminology again due to Laird and Louis; see also Section 3 below) estimator of the left hand side of (2.4) may be used in (2.5). We detail such estimation in a subsequent paper. Note that to effect unconditional bias correction (1.5) we would replace (2.3) by $R(\eta, \alpha) = E_{\hat\eta, Y_1 \mid \eta}\, r(\hat\eta, \eta, Y_1, \alpha)$, and solve $R(\hat\eta, \alpha') = \alpha$.

Under mild regularity conditions, our procedure gives a unique confidence interval.

Lemma 2.1. If $\partial r/\partial\alpha$ exists, then the bias corrected confidence interval is unique.

Proof. From (2.1) we see $q_\alpha(y_1, \hat\eta) \uparrow \alpha$, hence $r(\hat\eta, \eta, y_1, \alpha) \uparrow \alpha$. But

$$\frac{\partial R}{\partial\alpha} = \frac{\partial}{\partial\alpha}\int r(\hat\eta, \eta, y_1, \alpha)\, dG(\hat\eta \mid y_1, \eta) = \int \frac{\partial r(\hat\eta, \eta, y_1, \alpha)}{\partial\alpha}\, dG(\hat\eta \mid y_1, \eta) > 0.$$

Thus $R \uparrow \alpha$, and so (2.5) has a unique solution.
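Lemma 2.1 is what makes a simple rootfinder safe for (2.5): since $R$ is increasing in its last argument, bisection converges to the unique $\alpha'$ once it is bracketed. A minimal sketch follows. `R` is any callable returning the estimated expected tail probability for a trial $\alpha'$ (with $\hat\eta$ and $y_1$ already fixed); the stand-in `toy_R`, $R(a) = \Phi\{\Phi^{-1}(a)/1.2\}$, is purely hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def solve_alpha_prime(R, alpha, lo=1e-12, hi=1 - 1e-12, tol=1e-12):
    """Solve R(alpha') = alpha, as in (2.5), by bisection.

    Relies on R being increasing in alpha' (Lemma 2.1), so the root in
    (lo, hi) is unique once it is bracketed."""
    if not R(lo) <= alpha <= R(hi):
        raise ValueError("alpha is not bracketed by R(lo) and R(hi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def toy_R(a, spread=1.2):
    """Hypothetical R: for small a the actual tail probability R(a) exceeds a,
    i.e. the naive interval is too short."""
    return STD.cdf(STD.inv_cdf(a) / spread)


if __name__ == "__main__":
    lower = solve_alpha_prime(toy_R, 0.025)
    upper = solve_alpha_prime(toy_R, 0.975)
    print(lower, upper)
```

The two calls correct the lower and upper tails separately, as suggested above by applying (2.4) to each tail; false position or a similar bracketing method would serve equally well.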

Conditional coverage given $Y_1$ is consistent with the Bayesian view given in (1.1) and (1.2), since in the exchangeable case $Y_1$ is sufficient for $\theta_1$ in the posterior family, i.e., $f(\theta_1 \mid Y, \eta) = f(\theta_1 \mid Y_1, \eta)$. Typically when $n_1 > 1$ we condition on a minimal sufficient function of $y_1$ (see Examples 2.3 and 2.4 below). Moreover, Theorem 2.1 below and our empirical work show that our conditional bias correction approach for suitable $\hat\eta$ in fact roughly achieves (1.6) with $b(Y) = Y_1$.

Implementation of (2.3)-(2.5) may be easier if $\hat\eta$ is independent of $Y_1$, e.g., if $\hat\eta$ is based on $Y_2, \ldots, Y_p$. The integration in (2.3) is then over the usually more accessible distribution of $\hat\eta \mid \eta$, but correction is still conditional given $Y_1$ (see Case II of Section 4).

Again for $\theta_i$ scalar, suppose there exists a function $\xi_i$ of $\theta_i$ and $y_i$, monotone in $\theta_i$ for fixed $y_i$, such that the conditional distribution of $\xi_i$ given $y_i$ is the same as the unconditional distribution of $\xi_i$. Then $\xi_i$ may be called a pivotal (see Cox and Hinkley, 1974). Bias correction of (1.3) or (1.4) is equivalent to bias correction of the corresponding quantiles of $\xi_i$'s distribution. Expressions (2.1) and (2.2) may now be replaced by corresponding ones with $y_1$ deleted.

If unconditional EB coverage is the objective, the pivotal is helpful. We may integrate trivially over $Y_1 \mid \hat\eta, \eta$ and then numerically over $\hat\eta \mid \eta$. A corresponding version of Lemma 2.1 holds, and a corresponding version of Theorem 2.1 will go through if $\xi_1 \mid \eta$ and $\hat\eta \mid \eta$ are stochastically ordered in $\eta$. Bounds on the unconditional expected tail probability result. To illustrate, we turn to Examples 2.1 and 2.2, where a pivotal is available enabling simple bias correction to satisfy (1.5).

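Example 2.1. Exponential/Inverse Gamma (IG). First suppose $n_i = 1$ for all $i$. Let $Y_i \mid \theta_i \stackrel{ind}{\sim} \text{Exponential}(\theta_i)$, $i = 1, \ldots, p$, and let $\theta_1, \ldots, \theta_p \stackrel{iid}{\sim} \text{IG}(\eta, b)$, $\eta, b > 0$. Thus $f(y_i \mid \theta_i) = \theta_i^{-1}\exp(-y_i/\theta_i)$, $y_i > 0$, $\theta_i > 0$, $i = 1, \ldots, p$, and $\pi(\theta_i \mid \eta, b) = \exp\{-1/(\theta_i b)\} / \{\Gamma(\eta)\, b^{\eta}\, \theta_i^{\eta+1}\}$, $\eta, b > 0$, $i = 1, \ldots, p$. Hence the marginal distribution of $Y_i$ is

$$m(y_i \mid \eta, b) = \frac{\eta b}{(b y_i + 1)^{\eta+1}}, \qquad y_i > 0 \tag{2.6}$$

and the posterior distribution of $\theta_i$ is

$$f(\theta_i \mid y_i, \eta, b) = \frac{\exp\{-(y_i + 1/b)/\theta_i\}\,(y_i + 1/b)^{\eta+1}}{\Gamma(\eta + 1)\, \theta_i^{\eta+2}}, \tag{2.7}$$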


that is, (2.7) is Inverse Gamma$(\eta + 1, (y_i + 1/b)^{-1})$. Taking $b = 1$, from (2.7) we have the pivotal $\xi_i = \theta_i/(y_i + 1) \sim \text{IG}(\eta + 1, 1)$. From (2.6) the MLE of $\eta$ is $\hat\eta = p / \sum_i \log(y_i + 1)$ and (2.2) becomes $r(\hat\eta, \eta, \alpha) = 1 - D_{2(\eta+1)}\{D^{-1}_{2(\hat\eta+1)}(1 - \alpha)\}$, where $D_k$ is the $\chi^2$ c.d.f. with $k$ degrees of freedom, $k$ not necessarily an integer. For unconditional coverage we need the distribution of $\hat\eta \mid \eta$, which is IG$(p, 1/(\eta p))$. We solve $R(\hat\eta, \alpha') = \alpha$ using a one-dimensional numerical integration (transforming the IG to the interval (0,1) and using 16-point Gaussian integration; see Abramowitz and Stegun 1967) with one rootfinder (using false position). As an illustration, Figure 1 plots $\alpha'(\eta, \alpha)$ versus $\eta$ for nominal upper and lower tail areas $\alpha = .01, .025, .05, .1$, with $p = 10$.

(Note: Insert Figure 1 about here)

For conditional coverage we need the conditional distribution of $\hat\eta \mid y_1, \eta$. This may be obtained by routine transformation after noting that, given $\eta$, $\hat\eta$ and $\hat\eta \log(Y_1 + 1)/p$ are independent, the latter having a Beta$(1, p - 1)$ distribution. We omit the details.
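To make the ingredients of Example 2.1 concrete, the sketch below draws from the marginal (2.6) with $b = 1$ by inverting its c.d.f. $1 - (y + 1)^{-\eta}$, computes the MLE $\hat\eta = p/\sum_i \log(y_i + 1)$, and forms the ratio $\hat\eta \log(Y_1 + 1)/p$ used for conditional coverage. Under (2.6), $\log(Y_i + 1)$ is exponential with rate $\eta$, which is why $\hat\eta \mid \eta$ is IG$(p, 1/(\eta p))$ and the ratio is Beta$(1, p - 1)$. The function names are illustrative; computing $r$ itself would additionally need a $\chi^2$ c.d.f. and quantile routine for non-integer degrees of freedom, which we do not attempt here.

```python
import math
import random


def sample_marginal(eta, p, rng):
    """Y_1..Y_p i.i.d. from m(y | eta) = eta / (y + 1)^(eta + 1), i.e. (2.6)
    with b = 1, using the inverse of the survival function (y + 1)^(-eta)."""
    return [(1.0 - rng.random()) ** (-1.0 / eta) - 1.0 for _ in range(p)]


def eta_mle(y):
    """MLE of eta under (2.6) with b = 1."""
    return len(y) / sum(math.log1p(v) for v in y)


def beta_ratio(y):
    """eta_hat * log(Y_1 + 1) / p, i.e. log(Y_1 + 1) / sum_i log(Y_i + 1);
    Beta(1, p - 1) given eta and independent of eta_hat."""
    return math.log1p(y[0]) / sum(math.log1p(v) for v in y)


if __name__ == "__main__":
    rng = random.Random(7)
    y = sample_marginal(eta=2.0, p=10, rng=rng)
    print(eta_mle(y), beta_ratio(y))
```

Averaging $1/\hat\eta$ over repeated samples should give roughly $1/\eta$, and averaging the ratio should give roughly $1/p$, which is a quick check on the distributional facts quoted above.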

Example 2.2. We can extend Example 2.1 to the Gamma/IG problem, i.e., $Y_i \mid \theta_i \stackrel{ind}{\sim} \text{Gamma}(\nu_i, \theta_i)$, where the $\nu_i$ are known and not necessarily all equal (for example, $\nu_i$ might be $n_i$), and $\theta_i \stackrel{ind}{\sim} \text{IG}(\eta, b)$, $i = 1, \ldots, p$. Again we take $b = 1$. (Note that this case includes the $\chi^2$ scale problem.) One can show that

$$Y_i \mid \eta \sim \frac{\Gamma(\nu_i + \eta)}{\Gamma(\nu_i)\,\Gamma(\eta)}\, \frac{y_i^{\nu_i - 1}}{(y_i + 1)^{\nu_i + \eta}},$$

a Pearson Type VI distribution (Johnson and Kotz, 1970). Again $\xi_i = \theta_i/(y_i + 1)$ is a pivotal, which is now distributed as IG$(\nu_i + \eta, 1)$. While the MLE $\hat\eta$ is no longer available in closed form, the likelihood equation can be written $T(\hat\eta) = \sum_i \log(y_i + 1)$, where $T(\eta) = \sum_i [\psi(\nu_i + \eta) - \psi(\eta)]$ ($\psi$ the digamma function) is decreasing in $\eta$, and thus we can use $T(\hat\eta)$ to implement bias correction.

Remark 1. With a pivotal, unconditional correction will automatically bias correct conditionally given any $T(Y)$ independent of $\hat\eta$, since integration over $\hat\eta \mid T, \eta$ is the same as over $\hat\eta \mid \eta$. This means that if $\hat\eta$ is chosen independent of $Y_1$, unconditional bias correction will achieve conditional bias correction given $Y_1$ (see Example 2.4). If $\hat\eta$ and $Y_1$ are not independent, the pivotal is not helpful, since the integration in (2.3) is still with respect to $g(\hat\eta \mid y_1, \eta)$ even if $r$ is free of $y_1$.

Examples 2.3 and 2.4 offer a class of problems where (2.4) is free of $\eta$ as well as $y_1$. This means $\alpha'$ can be obtained from $\alpha$ without having to estimate $\eta$, and nominal unconditional coverage is exactly achieved. Empirical work in Section 4 shows that such unconditional intervals demonstrate good conditional behavior given $Y_1$ as well.

Example 2.3. The normal/normal problem, where we assume $n_i = 1$ for all $i$. Thus we have $Y_i \mid \theta_i \stackrel{ind}{\sim} N(\theta_i, \sigma^2)$, $\theta_i \stackrel{iid}{\sim} N(\mu, \tau^2)$, $i = 1, \ldots, p$. Let $\sigma^2$ be known and equal to 1 w.l.o.g. Then

$$f(\theta_1 \mid y_1, \mu) = N\left(B\mu + (1 - B)\, y_1,\; 1 - B\right) \tag{2.8}$$

where $B = 1/(1 + \tau^2)$. If we assume $\tau^2$ known, $\xi_1 = \theta_1 - (1 - B) Y_1$ is a pivotal distributed as $N(B\mu, 1 - B)$. If $q_\alpha(\mu)$ denotes the $\alpha$th quantile of this distribution, $q_\alpha(\mu) = B\mu + \sqrt{1 - B}\, \Phi^{-1}(\alpha)$ and (2.2) becomes

$$r(\hat\mu, \mu, \alpha) = \Phi\left\{ \frac{B(\hat\mu - \mu)}{\sqrt{1 - B}} + \Phi^{-1}(\alpha) \right\} \tag{2.9}$$

where $\hat\mu = \bar Y$. For EB coverage we integrate (2.9) with respect to the distribution of $\hat\mu \mid \mu$, which is $N(\mu, 1/(Bp))$. Clearly the resulting $R$ is free of $\mu$; $\alpha'$ depends only on $\alpha$ (for given $B$ and $p$). Hence exact unconditional bias correction can be achieved and exact EB coverage attained (see Cox, 1975, Section 6). Conditional bias correction requires integration with respect to the distribution of $\hat\mu \mid y_1, \mu$, which is $N\left((\mu(p - 1) + Y_1)/p,\; (p - 1)/(Bp^2)\right)$.
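In the setting of (2.9) both the unconditional and the conditional $R$ can be written in closed form, using the standard identity $E\,\Phi(a + sZ) = \Phi(a/\sqrt{1 + s^2})$ for $Z \sim N(0, 1)$; that step and the function names below are ours rather than the text's. The sketch returns the unconditional $\alpha'$, which indeed does not involve $\mu$, and solves the conditional equation (2.5) given $Y_1 = y_1$ by bisection, plugging $\hat\mu$ in for $\mu$. Each tail of the equal-tail interval (1.3) is corrected separately, solving for tail areas $\alpha/2$ and $1 - \alpha/2$.

```python
from statistics import NormalDist

STD = NormalDist()


def alpha_prime_unconditional(alpha, B, p):
    """Unconditional alpha' for (2.9): with mu_hat ~ N(mu, 1/(Bp)),
    B(mu_hat - mu)/sqrt(1 - B) ~ N(0, s2), s2 = B/(p(1 - B)), so
    R(a) = Phi(Phi^{-1}(a) / sqrt(1 + s2)), whatever the value of mu."""
    s2 = B / (p * (1 - B))
    return STD.cdf(STD.inv_cdf(alpha) * (1 + s2) ** 0.5)


def R_conditional(a, y1, mu, B, p):
    """(2.3) for (2.9) given Y_1 = y1, averaging over
    mu_hat | y1, mu ~ N(((p - 1) mu + y1) / p, (p - 1) / (B p^2))."""
    c = B / (1 - B) ** 0.5
    m = c * (y1 - mu) / p                  # mean of c * (mu_hat - mu)
    v = c * c * (p - 1) / (B * p * p)      # variance of c * (mu_hat - mu)
    return STD.cdf((m + STD.inv_cdf(a)) / (1 + v) ** 0.5)


def alpha_prime_conditional(alpha, y1, mu_hat, B, p, tol=1e-12):
    """Solve (2.5), R_conditional(alpha'; y1, mu_hat) = alpha, by bisection
    (R is increasing in alpha', Lemma 2.1)."""
    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R_conditional(mid, y1, mu_hat, B, p) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def corrected_interval(y1, mu_hat, B, p, alpha):
    """Interval (1.3) with each tail area replaced by its conditionally
    bias-corrected value."""
    center = B * mu_hat + (1 - B) * y1
    sd = (1 - B) ** 0.5
    a_lo = alpha_prime_conditional(alpha / 2, y1, mu_hat, B, p)
    a_hi = alpha_prime_conditional(1 - alpha / 2, y1, mu_hat, B, p)
    return center + sd * STD.inv_cdf(a_lo), center + sd * STD.inv_cdf(a_hi)
```

In the unconditional case the corrected half-width works out to $\Phi^{-1}(1 - \alpha/2)\sqrt{1 - B + B/p}$, compared with $\Phi^{-1}(1 - \alpha/2)\sqrt{1 - B}$ for the naive interval.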
Alternatively, if we assume $\mu$ known but $\tau^2$ (hence $B$) unknown, then no pivotal from (2.8) is possible. For conditional bias correction, assuming $\hat B$ is a function of $T = \sum_i (Y_i - \mu)^2$, we need the distribution of $T \mid Y_1, B$, which is immediate from the fact that $T - (Y_1 - \mu)^2 \mid Y_1, B \sim B^{-1}\chi^2_{p-1}$ (see Case I of Section 4 below).
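For the case just described ($\mu$ known, $B$ unknown), the following sketch evaluates the conditional $R$ of (2.3) by Monte Carlo over $T \mid Y_1 = y_1$, using $T = (y_1 - \mu)^2 + W/B$ with $W \sim \chi^2_{p-1}$, and then solves (2.5) with the observed $\hat B$ in place of $B$. From (2.2) and (2.8), $r = \Phi[\{(B - \hat B)(y_1 - \mu) + \sqrt{1 - \hat B}\,\Phi^{-1}(a)\}/\sqrt{1 - B}]$ (our own short derivation). The estimator $\hat B = \min\{1, (p - 2)/T\}$ is a hypothetical choice made only for illustration (the text requires only that $\hat B$ be a function of $T$), and the Monte Carlo average stands in for exact integration over the scaled $\chi^2_{p-1}$ density. Reusing the same $W$ draws for every trial $\alpha'$ keeps the estimated $R$ monotone, so bisection applies as in Lemma 2.1.

```python
import random
from statistics import NormalDist

STD = NormalDist()


def B_hat(T, p):
    """Hypothetical estimator of B = 1/(1 + tau^2) as a function of T."""
    return min(1.0, (p - 2) / T)


def r_tail(b_hat, B, y1, mu, z):
    """(2.2) under (2.8): probability that theta_1 ~ N(B mu + (1 - B) y1, 1 - B)
    falls below the naive quantile computed with b_hat; z = Phi^{-1}(a)."""
    num = (B - b_hat) * (y1 - mu) + (1 - b_hat) ** 0.5 * z
    return STD.cdf(num / (1 - B) ** 0.5)


def make_R(y1, mu, B, p, draws=4000, seed=0):
    """Monte Carlo version of (2.3) given Y_1 = y1 (mu known).

    T | y1, B = (y1 - mu)^2 + W / B with W ~ chi-square(p - 1); the same W
    draws are reused for every a, so R(a) is monotone in a."""
    rng = random.Random(seed)
    w = [rng.gammavariate((p - 1) / 2, 2.0) for _ in range(draws)]
    bh = [B_hat((y1 - mu) ** 2 + wk / B, p) for wk in w]

    def R(a):
        z = STD.inv_cdf(a)
        return sum(r_tail(b, B, y1, mu, z) for b in bh) / len(bh)

    return R


def solve(R, alpha, tol=1e-10):
    """Bisection for R(alpha') = alpha, R increasing in alpha'."""
    lo, hi = 1e-12, 1 - 1e-12
    if not R(lo) <= alpha <= R(hi):
        raise ValueError("alpha is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if R(mid) < alpha else (lo, mid)
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    p, mu, y1, T_obs = 20, 0.0, 1.5, 60.0
    b_obs = B_hat(T_obs, p)                # plug-in for B, as in (2.5)
    R = make_R(y1, mu, b_obs, p)
    print(b_obs, solve(R, 0.025), solve(R, 0.975))
```

The two roots give the corrected lower and upper tail areas for (1.3) at $\alpha = 0.05$; with a different $\hat B$ rule only `B_hat` would change.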

If both $\mu$ and $\tau^2$ are assumed unknown, conditional bias correction requires the joint distribution of $\bar Y$ and $\sum_i (Y_i - \bar Y)^2$ given $Y_1, \mu, B$, which can be attacked through a Helmert transformation on $Y$. If $\bar Y$ and $\sum_i (Y_i - \bar Y)^2$ are based only on $Y_2, \ldots, Y_p$, matters are simpler.

Example 2.4. The previous example can be extended to the case of $p$ simultaneous regressions. Let $Y_i \mid \theta_i \sim N(X_i\theta_i, \sigma_i^2 I)$, $i = 1, \ldots, p$, where $Y_i$ is $n_i \times 1$, $X_i$ is $n_i \times k$ of full rank, and $\sigma_i^2$ is assumed known. In practice we would use an independent estimator of $\sigma_i^2$ based upon $Y_i$ in what follows. When $n_i$ is at least moderate there is evidence (Lawless, 1981, pp. 463-4) that the resulting coverage will differ little from that with $\sigma_i^2$ known.

Suppose $\theta_i \stackrel{iid}{\sim} N(\mu_\theta, \tau^2 I)$. This prior is perhaps most reasonable if the columns of the $X_i$ are centered and scaled. For convenience we in fact assume that $X_i^T X_i = I_{k \times k}$. Routine calculation shows that $\theta_i \mid Y_i, \mu_\theta, \tau^2 \sim N\left(B_i\mu_\theta + (1 - B_i)X_i^T Y_i,\; \sigma_i^2(1 - B_i) I\right)$, where $B_i = \sigma_i^2/(\sigma_i^2 + \tau^2)$, while $Y \mid \mu_\theta, \tau^2 \sim N(X\mu_\theta, \Sigma_Y)$, where $X^T = [X_1^T, \ldots, X_p^T]$ and $\Sigma_Y$ is block diagonal with $i$th block $\sigma_i^2 I + \tau^2 X_i X_i^T$. If $\tau^2$ is assumed known, then $\xi_1 = \theta_1 - (1 - B_1)X_1^T Y_1$ is a pivotal having distribution $N(B_1\mu_\theta, \sigma_1^2(1 - B_1) I)$, while $\hat\mu_\theta = (X^T\Sigma_Y^{-1}X)^{-1}X^T\Sigma_Y^{-1}Y \sim N\left(\mu_\theta, (\sum_i B_i/\sigma_i^2)^{-1} I\right)$. The independence of the coordinates of $\xi_1$, combined with the argument at the beginning of Example 2.3, enables construction of a simultaneous $k$-dimensional confidence rectangle attaining exactly nominal EB coverage. A simultaneous EB confidence ellipsoid can be developed by noting that, given $\hat\mu_\theta$, $\|\xi_1 - B_1\hat\mu_\theta\|^2 \sim \sigma_1^2(1 - B_1)\,\chi'^2_k(\lambda_1)$, a scaled noncentral $\chi^2$ with noncentrality $\lambda_1 = B_1^2\|\hat\mu_\theta - \mu_\theta\|^2/\{\sigma_1^2(1 - B_1)\}$, and then bias correcting the corresponding tail probability $r(\lambda_1, \alpha)$. Conditional EB coverage could be attempted through the distribution of $\hat\mu_\theta \mid Y_1$. However, if $\hat\mu_\theta$ is calculated deleting $Y_1$, then by Remark 1 above, exact conditional EB coverage given $Y_1$ can be achieved. If $\tau^2$ is assumed unknown, matters become much more complicated. No pivotal exists, $\hat\mu_\theta$ and $\hat\tau^2$ will be unavailable in closed form unless all $\sigma_i^2$ are equal, and the conditional distribution of $(\hat\mu_\theta, \hat\tau^2) \mid Y_1$ is intractable. A bootstrapping method will be the only feasible approach.


Does the conditional bias correction method actually produce approximate conditional coverage given $Y_1$? To answer the question, again taking $i = 1$, we need to see how close the expectation

$$P_\eta\left(\theta_1 \le q_{\alpha'(\hat\eta, y_1, \alpha)}(y_1, \hat\eta) \mid Y_1 = y_1\right) = E_{\hat\eta \mid y_1, \eta}\, r\left(\hat\eta, \eta, y_1, \alpha'(\hat\eta, y_1, \alpha)\right) \tag{2.10}$$

is to $\alpha$. Under usual conditions, since $\theta_1$ is continuous, if $\hat\eta$ is a consistent estimator of $\eta$ (as $p$ tends to infinity) then (2.10) will converge to $\alpha$. For fixed $p$, while exact evaluation of (2.10) is not possible, Theorem 2.1 is encouraging, since it shows that in many cases (2.10) falls in an interval containing $\alpha$.


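THEOREM 2.1. Suppose both $f(\theta_1 \mid y_1, \eta)$ and $g(\hat\eta \mid y_1, \eta)$ are stochastically ordered families in $\eta$ for fixed $y_1$. Then the conditional expected "tail probability" (2.10) is bounded above by $\alpha + \max(I_1, I_2)$ and below by $\alpha + \min(I_1, I_2)$, where

$$I_1 = \int_{\hat\eta > \eta} \left[\alpha'(\hat\eta, y_1, \alpha) - r\left(\hat\eta, \eta, y_1, \alpha'(\eta, y_1, \alpha)\right)\right] g(\hat\eta \mid y_1, \eta)\, d\hat\eta, \quad \text{and}$$

$$I_2 = \int_{\hat\eta < \eta} \left[\alpha'(\hat\eta, y_1, \alpha) - r\left(\hat\eta, \eta, y_1, \alpha'(\eta, y_1, \alpha)\right)\right] g(\hat\eta \mid y_1, \eta)\, d\hat\eta.$$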


Proof. We prove the case where both $f(\theta_1 \mid y_1, \eta)$ and $g(\hat\eta \mid y_1, \eta)$ are stochastically increasing in $\eta$, with the proof for the other cases following similarly. Thus $q_\alpha(y_1, \eta) \uparrow \eta$ for fixed $y_1$, and in fact from (2.2), $r(\hat\eta, \eta, y_1, \alpha) \uparrow \hat\eta$ while $r(\hat\eta, \eta, y_1, \alpha) \downarrow \eta$. Since $g(\hat\eta \mid y_1, \eta)$ is stochastically increasing in $\eta$,

$$R(\eta, y_1, \alpha) = E_{\hat\eta \mid y_1, \eta}\, r(\hat\eta, \eta, y_1, \alpha) \uparrow \eta \tag{2.11}$$

(see, e.g., Lemma 2, Chapter 3, Lehmann 1986). Also, the mild regularity condition of Lemma 2.1 ensures that $R(\eta, y_1, \alpha) \uparrow \alpha$.


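Next, let $\eta_1 < \eta_2$, and consider, for a specified $\alpha_0$, $\alpha'(\eta_1, y_1, \alpha_0)$ and $\alpha'(\eta_2, y_1, \alpha_0)$ arising from $R(\eta_1, y_1, \alpha') = \alpha_0$ and $R(\eta_2, y_1, \alpha') = \alpha_0$, respectively. By (2.11), $R(\eta_1, y_1, \alpha)$ lies below $R(\eta_2, y_1, \alpha)$, whence $\alpha'(\eta_1, y_1, \alpha) > \alpha'(\eta_2, y_1, \alpha)$, that is, $\alpha'(\eta, y_1, \alpha) \downarrow \eta$. Thus if $\eta < \hat\eta$,

$$r\left(\eta, \eta, y_1, \alpha'(\hat\eta, y_1, \alpha)\right) \le r\left(\hat\eta, \eta, y_1, \alpha'(\hat\eta, y_1, \alpha)\right) \le r\left(\hat\eta, \eta, y_1, \alpha'(\eta, y_1, \alpha)\right). \tag{2.12}$$

In addition, the inequalities in (2.12) are reversed if $\hat\eta < \eta$. The left hand side of (2.12) equals $\alpha'(\hat\eta, y_1, \alpha)$ and thus is decreasing in $\hat\eta$; the right hand side of (2.12) is increasing in $\hat\eta$. However, we cannot conclude monotonicity for $r(\hat\eta, \eta, y_1, \alpha'(\hat\eta, y_1, \alpha))$. Figure 2 offers a generic view of the situation.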

(Note: Insert Figure 2 about here)

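Finally, since $E_{\hat\eta \mid y_1, \eta}\, r(\hat\eta, \eta, y_1, \alpha'(\eta, y_1, \alpha)) = R(\eta, y_1, \alpha'(\eta, y_1, \alpha)) = \alpha$ by definition, we may write (2.10) as $\alpha$ plus the expectation of $r(\hat\eta, \eta, y_1, \alpha'(\hat\eta, y_1, \alpha)) - r(\hat\eta, \eta, y_1, \alpha'(\eta, y_1, \alpha))$; bounding this difference by (2.12) separately on $\{\hat\eta > \eta\}$ and $\{\hat\eta < \eta\}$, the bounds in the theorem follow.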

Remark 2. From Figure 2 we see that $I_1 \cdot I_2 \le 0$, whence (2.10) falls in an interval containing $\alpha$.

Remark 3. The fact that $Y_1$ (or a function of $Y_1$) enters directly into the posterior (hence into all of our subsequent expressions) makes qualitative examination of the conditional coverage of our bias correction method given $Y_1$ straightforward. Analytic examination of conditional coverage given other characteristics of the data does not seem promising, except in cases where a pivotal exists, as in Remark 1.

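If $\eta$ is of dimension $k$, then calculation of $R$ requires a $k$-dimensional numerical integration, and the solution of (2.5) requires a rootfinding algorithm. A possible alternative to the numerical integration is to utilize the approach of Cox (1975), who suggests expansion of $r(\hat\eta, \eta, y_1, \alpha)$ in $\hat\eta$ about $\eta$, i.e.,

$$r(\hat\eta, \eta, y_1, \alpha) \approx r(\eta, \eta, y_1, \alpha) + (\hat\eta - \eta)^T \nabla_r(\eta) + \tfrac{1}{2}(\hat\eta - \eta)^T H_r(\eta)(\hat\eta - \eta),$$

where $(\nabla_r(\eta))_i = \partial r/\partial\hat\eta_i \big|_{\hat\eta = \eta}$ and $(H_r(\eta))_{ij} = \partial^2 r/\partial\hat\eta_i\,\partial\hat\eta_j \big|_{\hat\eta = \eta}$, whence, using $r(\eta, \eta, y_1, \alpha) = \alpha$,

$$R(\eta, y_1, \alpha) \approx \alpha + E_{\hat\eta \mid y_1, \eta}(\hat\eta - \eta)^T \nabla_r(\eta) + \tfrac{1}{2}\operatorname{tr}\left[H_r(\eta)\, E_{\hat\eta \mid y_1, \eta}(\hat\eta - \eta)(\hat\eta - \eta)^T\right]. \tag{2.13}$$

Denoting the right hand side of (2.13) by $R^*(\eta, y_1, \alpha)$, analogous to (2.5) we may solve $R^*(\hat\eta, y_1, \alpha') = \alpha$ for $\alpha'$. Note that even if $g(\hat\eta \mid y_1, \eta)$ is a standard distribution, so that the mean and covariance of $\hat\eta$ are readily available, (2.13) still requires the evaluation of $2k + \binom{k}{2}$ numerical derivatives.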
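The expansion (2.13) is easy to sketch for scalar $\eta$, with the two derivatives taken by central differences. Below it is checked against the conditional normal/normal case of Example 2.3, where $r$ is given by (2.9) and the moments of $\hat\mu - \mu$ given $y_1$ follow from the conditional distribution stated there; the exact $R$ used for comparison relies on the identity $E\,\Phi(a + sZ) = \Phi(a/\sqrt{1 + s^2})$. The step size `h` and the function names are illustrative assumptions.

```python
from statistics import NormalDist

STD = NormalDist()


def cox_R(r, eta, mean_dev, second_moment, h=1e-3):
    """Second-order approximation (2.13) for scalar eta.

    r             : callable eta_hat -> r(eta_hat, eta, y1, alpha), other args fixed
    mean_dev      : E(eta_hat - eta) under g(eta_hat | y1, eta)
    second_moment : E(eta_hat - eta)^2 under the same density
    """
    r0 = r(eta)                                   # equals alpha by (2.1)
    rp, rm = r(eta + h), r(eta - h)
    d1 = (rp - rm) / (2 * h)                      # central first difference
    d2 = (rp - 2 * r0 + rm) / (h * h)             # central second difference
    return r0 + mean_dev * d1 + 0.5 * d2 * second_moment


def normal_example(y1, mu, B, p, alpha):
    """Return (approximate R, exact R) for (2.9) given Y_1 = y1."""
    c = B / (1 - B) ** 0.5
    z = STD.inv_cdf(alpha)
    r = lambda mu_hat: STD.cdf(c * (mu_hat - mu) + z)
    mean_dev = (y1 - mu) / p
    var = (p - 1) / (B * p * p)
    approx = cox_R(r, mu, mean_dev, var + mean_dev ** 2)
    exact = STD.cdf((c * mean_dev + z) / (1 + c * c * var) ** 0.5)
    return approx, exact


if __name__ == "__main__":
    print(normal_example(y1=0.0, mu=0.0, B=0.5, p=10, alpha=0.025))
```

Here only three evaluations of $r$ are needed, consistent with $2k + \binom{k}{2} = 2$ derivatives when $k = 1$; in this example the approximation agrees with the exact value to roughly three decimal places.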

3. THE MARGINAL POSTERIOR APPROACH

In the parametric empirical Bayes (PEB) setting, several authors have attempted to account for the variation in estimating the hyperparameter $\eta$ by introducing a hyperprior distribution on $\eta$. Corresponding quantiles of the resulting "marginal posterior" are used in place of those of the estimated posterior. As a mixture of posteriors, this marginal posterior typically has more spread than the estimated posterior, so that intervals longer than the naive ones result. This section is intended to illuminate this marginal posterior approach.

To formalize the setup we again confine ourselves to the exchangeable case, using the notation of Section 1. Suppose $\hat\eta(Y)$ is an estimator of $\eta$ which is sufficient for the marginal family $m(Y \mid \eta)$ and has density $p(\hat\eta \mid \eta)$ with respect to Lebesgue measure. Let $\tau(\eta)$ be a continuous hyperprior on $\eta$, which induces the conditional distribution $h(\eta \mid \hat\eta) \propto p(\hat\eta \mid \eta)\,\tau(\eta)$, which in turn induces the "marginal posterior" for $\theta_i$,

$$l_h(\theta_i \mid y_i, \hat\eta) = \int f(\theta_i \mid y_i, \eta)\, h(\eta \mid \hat\eta)\, d\eta. \tag{3.1}$$

We subscript $l$ to indicate which mixing distribution was used with the posterior. The naive intervals (1.3) and (1.4) would be replaced with corresponding lower and upper points of $l_h$. Hence coverage in the sense of (1.5) or (1.6) will vary with the specification of $\tau$, or equivalently, $h$. This pure Bayesian approach is less targeted at achieving specified EB coverage than that of Section 2. For example, there is no obvious relationship between using a vague hyperprior and achieving nominal EB coverage through the resulting (3.1). In fact, Laird and Louis (1987) were empirically successful in the normal/normal problem (Example 2.3) with known prior mean and unknown prior variance using $l_p$ (i.e., mixing with respect to $p(\eta \mid \hat\eta)$, the sampling density with $\eta$ and $\hat\eta$ exchanged). The key issue (a non-Bayesian one) concerns the existence and nature of an $h$ which will be successful in achieving nominal EB coverage. (We defer a rough discussion of this issue until the end of the section.) For instance, if the naive EB confidence interval is too long (as in Case I of Section 4), this approach seems doomed to failure; we need to correct, not lengthen.

When $p$ is available in closed form, the numerical integration in (3.1) can be carried out directly (Deely and Lindley 1981, Rubin 1982). Morris (1987) suggests approximating $l_h$ using the member of the posterior family $f$ whose first two moments agree with those of $l_h$. Laird and Louis (1987) suggest approximating (3.1) by the use of a Type III parametric bootstrap. That is, given $\hat\eta$, draw $\theta^*$ from $\pi(\theta \mid \hat\eta)$. Then draw $Y^*$ from $f(y \mid \theta^*)$, and finally calculate $\eta^* = \hat\eta(Y^*)$. Repeating this process $N$ times, we obtain $\eta^*_j$, $j = 1, \ldots, N$, distributed as $p(\cdot \mid \hat\eta)$. The discrete mixture distribution

$$\frac{1}{N}\sum_{j=1}^{N} f(\theta_i \mid y_i, \eta^*_j) \tag{3.2}$$

is taken as the estimator of (3.1), and quantiles of (3.2), obtained by a rootfinder, are used instead of those of (3.1).
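A minimal sketch of the Type III parametric bootstrap and the mixture (3.2), in the normal/normal setting with $B$ known and $\mu$ unknown ($\sigma^2 = 1$, $\hat\mu = \bar Y$). Mixture quantiles are found by bisection on the mixture c.d.f., which is just one possible rootfinder. In this setting $\mu^* \sim N(\hat\mu, 1/(Bp))$, so the mixture should be close to $N(B\hat\mu + (1 - B)y_1, 1 - B + B/p)$, in line with Example 3.1 below; the test exploits this. The function names are ours.

```python
import random
from statistics import NormalDist, fmean


def type3_draws(mu_hat, B, p, N, rng):
    """Type III parametric bootstrap: theta* ~ pi(. | mu_hat), Y* ~ f(. | theta*),
    mu* = mean(Y*). Returns N draws distributed as p(. | mu_hat)."""
    tau = ((1 - B) / B) ** 0.5                   # from B = 1 / (1 + tau^2)
    draws = []
    for _ in range(N):
        theta = [rng.gauss(mu_hat, tau) for _ in range(p)]
        y = [rng.gauss(t, 1.0) for t in theta]
        draws.append(fmean(y))
    return draws


def mixture_quantile(a, y1, B, mus, tol=1e-8):
    """a-th quantile of (3.2): equal-weight mixture of the posteriors
    N(B mu* + (1 - B) y1, 1 - B), by bisection on the mixture c.d.f."""
    sd = (1 - B) ** 0.5
    means = [B * m + (1 - B) * y1 for m in mus]
    comp = NormalDist(0.0, sd)
    cdf = lambda x: fmean(comp.cdf(x - m) for m in means)
    lo, hi = min(means) - 10 * sd, max(means) + 10 * sd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = random.Random(3)
    mus = type3_draws(mu_hat=0.2, B=0.5, p=10, N=5000, rng=rng)
    print(mixture_quantile(0.025, 1.0, 0.5, mus), mixture_quantile(0.975, 1.0, 0.5, mus))
```

The resulting interval is wider than the naive one, whose half-width is $\Phi^{-1}(0.975)\sqrt{1 - B}$, which illustrates the extra spread contributed by mixing.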

Note that (3.2) is an unbiased estimator of $l_p$ and converges almost surely to $l_p$ as $N \to \infty$, leading to criticism of its use in the comments following the Laird and Louis paper. But if the objective is EB coverage, $l_p$ (or an estimate of it, like (3.2)) may be as good as $l_h$. An important point is that since $p$ (hence $l_p$) changes as $\hat\eta$ changes, the performance of the Laird and Louis approach can be quite sensitive to the choice of $\hat\eta$ (see Example 3.2 and Table 1 below). The empirical success of (3.2) suggests that, for the examples to which it has been applied, with a good choice of $\hat\eta$, $p(\cdot \mid \hat\eta)$ is a good choice of $h$. For any 1-1 onto transformation $s_{\hat\eta}(\eta^*)$ of $\eta^*$ given $\hat\eta$, the Type III parametric bootstrap enables estimation of the corresponding mixture $l_s$ by $\sum_{j=1}^{N} f(\theta_i \mid y_i, s_{\hat\eta}(\eta^*_j))/N$, analogous to (3.2). There may exist a choice of $s$ such that $l_s$ "matches" $l_h$, i.e., $l_s = l_h$. This extension is attractive in that, like (3.2), it does not require that $p$ be given in closed form.

If $p$ is available in closed form, then for any $\tau$ the Type III parametric bootstrap provides an importance sampling Monte Carlo integration (Hammersley and Handscomb 1964, Geweke 1988) of (3.1) of the form

$$\frac{\sum_{j=1}^{N} f(\theta_i \mid y_i, \eta^*_j)\, k(\eta^*_j)}{\sum_{j=1}^{N} k(\eta^*_j)}, \tag{3.3}$$

where $k(\cdot) = p(\hat\eta \mid \cdot)\,\tau(\cdot)/p(\cdot \mid \hat\eta)$. Note that the standardizing constant for $h$ is not required. Implementation of the marginal posterior approach for a specified $\tau$ in the absence of a closed form for $p$ is unclear. We consider earlier examples in this context.

Example 3.1. Consider the normal/normal Example 2.3. Assume $\mu$ unknown but $B$ known. The sampling distribution $p(\hat\mu \mid \mu)$ is $N(\mu, 1/(Bp))$. For a flat hyperprior $\tau$, $h(\mu \mid \hat\mu)$ is $N(\hat\mu, 1/(Bp))$. Hence $l_s = l_h$ for $s_{\hat\mu}(\mu^*) = \mu^*$ (Laird and Louis, Theorem 1). If we assume $B$ unknown as well, Theorem 2 of Laird and Louis shows that no choice of $s$ will produce $l_s = l_h$.

Example 3.2. Consider again the exponential/inverse gamma Example 2.1. Recall that the sampling distribution $p(\hat\eta \mid \eta)$ is IG$(p, 1/(\eta p))$. Then the hyperprior associated with $l_p$ is neither simple nor natural. Under the flat hyperprior $\tau_1(\eta) = 1$, $\eta > 0$, $h_1(\eta \mid \hat\eta)$ is Gamma$(p + 1, \hat\eta/p)$, and there is no obvious choice of $s$ having distribution $h_1$, but we can use (3.3) to "match" (3.1). Under the hyperprior $\tau_2(\eta) = \eta^{-1}$, $\eta > 0$, $h_2(\eta \mid \hat\eta)$ is Gamma$(p, \hat\eta/p)$, and $s_{\hat\eta}(\eta^*) = \hat\eta^2/\eta^*$ does have density exactly $h_2$. Pepple (1988) places a flat hyperprior on $1/\eta$ but then approximates the resulting marginal posterior by a gamma distribution whose first two moments agree with those of the exact $l_h$.
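The weights in (3.3) can be sketched for Example 3.2. The draws $\eta^*_j$ come from a literal Type III bootstrap (with $b = 1$), and $p(x \mid \eta)$ is the IG$(p, 1/(\eta p))$ density of $\hat\eta$ from Example 2.1. To avoid needing an incomplete gamma routine, we use the weights to estimate the mean of $l_{h_1}$ (a mixture of the IG posteriors (2.7), whose means are $(y_1 + 1)/\eta$) rather than its quantiles; under $h_1 = \text{Gamma}(p + 1, \hat\eta/p)$ this mean is exactly $(y_1 + 1)/\hat\eta$, which gives a check. These choices and all function names are ours.

```python
import math
import random


def log_p(x, eta, p):
    """log density at eta_hat = x of eta_hat | eta ~ IG(p, 1/(eta p))."""
    return (p * math.log(eta * p) - (p + 1) * math.log(x)
            - eta * p / x - math.lgamma(p))


def type3_eta_draws(eta_hat, p, N, rng):
    """Type III bootstrap for Example 2.1 (b = 1): theta* ~ IG(eta_hat, 1),
    Y* ~ Exponential with mean theta*, eta* = p / sum log(Y* + 1)."""
    draws = []
    for _ in range(N):
        thetas = [1.0 / rng.gammavariate(eta_hat, 1.0) for _ in range(p)]
        ys = [rng.expovariate(1.0 / t) for t in thetas]
        draws.append(p / sum(math.log1p(v) for v in ys))
    return draws


def normalized_weights(eta_hat, draws, p, log_tau):
    """k(eta*) = p(eta_hat | eta*) tau(eta*) / p(eta* | eta_hat), as in (3.3),
    normalized to sum to one (so h's constant is never needed)."""
    logk = [log_p(eta_hat, e, p) + log_tau(e) - log_p(e, eta_hat, p)
            for e in draws]
    top = max(logk)                     # subtract the max for stability
    w = [math.exp(v - top) for v in logk]
    total = sum(w)
    return [v / total for v in w]


def marginal_posterior_mean(y1, draws, weights):
    """Weighted average of the posterior means (y1 + 1)/eta* of (2.7), b = 1."""
    return sum(w * (y1 + 1) / e for w, e in zip(weights, draws))


if __name__ == "__main__":
    rng = random.Random(11)
    p, eta_hat, y1 = 10, 3.0, 0.8
    draws = type3_eta_draws(eta_hat, p, 4000, rng)
    w = normalized_weights(eta_hat, draws, p, log_tau=lambda e: 0.0)  # tau_1 = 1
    print(marginal_posterior_mean(y1, draws, w), (y1 + 1) / eta_hat)
```

For this pairing the weights grow without bound as $\eta^* \to 0$; with moderate $p$ the bootstrap draws rarely fall where the weights are large, but this is worth monitoring. Passing `log_tau=lambda e: -math.log(e)` gives the weights for $\tau_2$.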

We return to the question of when $l_h$ may be expected to give approximately nominal EB coverage. For any marginal posterior (such as those in (3.1)-(3.3)), let $C^{(h)}_\alpha(Y_i, \hat\eta)$ be a $1 - \alpha$ posterior (Bayes) credible set for $\theta_i$, i.e., $P_{l_h}(\theta_i \in C^{(h)}_\alpha(Y_i, \hat\eta)) = 1 - \alpha$. Let $I(\theta_i, Y_i, \hat\eta) = 1$ if $(\theta_i, Y_i, \hat\eta)$ are such that $\theta_i \in C^{(h)}_\alpha(Y_i, \hat\eta)$, and 0 otherwise. Then, provided the distribution of $\eta \mid y_i$ is proper,

$$E_{\eta \mid y_i}\left[P_\eta\left(\theta_i \in C^{(h)}_\alpha(Y_i, \hat\eta) \mid y_i\right)\right] = E_{\theta_i, \hat\eta \mid y_i}\, I(\theta_i, y_i, \hat\eta) = E_{\hat\eta \mid y_i}(1 - \alpha) = 1 - \alpha.$$

Thus for any $l_h$ such that the distribution of $\eta \mid y_i$ is proper, on average (over $\eta$) $C^{(h)}_\alpha(Y_i, \hat\eta)$ meets (1.6); it provides conditional EB coverage given $Y_i$. A good $l_h$, however, requires that $P_\eta(\theta_i \in C^{(h)}_\alpha(Y_i, \hat\eta) \mid y_i) \approx 1 - \alpha$ for each $\eta$. To address this more demanding issue, consider the following rough argument (motivated by Laird, 1988), which provides insight in the case where a pivotal exists. Dropping $Y_i$ in (3.1) and replacing $\theta_i$ by $\xi_i$, let $q^{(h)}_\alpha(\hat\eta)$ denote the $\alpha$th quantile of $l_h(\xi_i \mid \hat\eta)$, and let $q_\alpha(\eta)$ denote the $\alpha$th quantile of the true distribution of $\xi_i \mid \eta$ (obtained from $f(\theta_i \mid y_i, \eta)$). Defining $r^{(h)}(\hat\eta, \eta, \alpha) = P_\eta(\xi_i \le q^{(h)}_\alpha(\hat\eta))$, we show when the expectation of $r^{(h)}$ over $\hat\eta \mid \eta$ will fall in an interval containing $\alpha$. Since mixing by $h$ will typically "spread out" the posterior (hence the distribution of $\xi_i \mid \eta$), we assume that for $\alpha$ small (near 0), $q^{(h)}_\alpha(\eta) < q_\alpha(\eta)$, while for $\alpha$ large (near 1), $q^{(h)}_\alpha(\eta) > q_\alpha(\eta)$.

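Suppose additionally that $h$ is such that, for $\alpha$ small, $r^{(h)}$ is approximately convex in $\hat\eta$, while for $\alpha$ large it is approximately concave in $\hat\eta$. (We argue below when this might be the case.) Finally, let $\hat\eta$ be unbiased for $\eta$. Then for $\alpha$ small,

$$r^{(h)}(\eta, \eta, \alpha) \le E_{\hat\eta \mid \eta}\, r^{(h)}(\eta, \hat\eta, \alpha) \le E_{\hat\eta \mid \eta}\, r(\eta, \hat\eta, \alpha) \equiv R(\eta, \alpha), \qquad (3.5)$$

where $r$ and $R$ are as in (2.2) and (2.3), with $y_i$ deleted because of the pivotal. The first inequality is Jensen's inequality for the convex function $r^{(h)}(\eta, \cdot, \alpha)$, since $E(\hat\eta \mid \eta) = \eta$. The second uses $q_\alpha^{(h)} < q_\alpha$. But also

$$r^{(h)}(\eta, \eta, \alpha) \le r(\eta, \eta, \alpha) = \alpha \le R(\eta, \alpha). \qquad (3.6)$$

The last inequality in (3.6) usually holds for the following reason. When $\alpha$ is small, the $\alpha'$ such that $\alpha = R(\eta, \alpha')$ is usually less than $\alpha$, and $R(\eta, \alpha)$ is increasing in $\alpha$ by Lemma 2.1.

Together, (3.5) and (3.6) place both $E_{\hat\eta \mid \eta}\, r^{(h)}(\eta, \hat\eta, \alpha)$ and $\alpha$ in the interval $[r^{(h)}(\eta, \eta, \alpha), R(\eta, \alpha)]$. This suggests that, for $\alpha$ small, $E_{\hat\eta \mid \eta}\, r^{(h)}(\eta, \hat\eta, \alpha)$ will be close to $\alpha$. For $\alpha$ large, our assumptions reverse the inequalities in (3.5) and (3.6), and a similar conclusion holds.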
To return to the question of the convexity or concavity of $r^{(h)}$, suppose the distribution of $\xi_i \mid \eta$ is unimodal. Then the c.d.f. of $\xi_i \mid \eta$ will be an increasing convex (concave) function below (above) the mode. Hence, if $h$ is such that $q_\alpha^{(h)}$ is approximately convex (concave) in $\hat\eta$ for $\alpha$ small (large), then $r^{(h)}$ will be approximately convex (concave) in $\hat\eta$ for $\alpha$ small (large). Recall that, under families stochastically ordered in $\eta$, $q_\alpha^{(h)}(\hat\eta)$ will be monotone in $\hat\eta$. Implicit differentiation of the definition of $q_\alpha^{(h)}(\hat\eta)$ yields an expression for its second derivative; we omit the details.

4.  SIMULATED  COVERAGE  PROBABILITIES  AND  INTERVAL  LENGTHS 

In  this  section  we  present  the  results  of  simulation  studies  comparing  the  methods 
discussed  in  the  previous  two  sections.  We  first  offer  results  for  the  bias  corrected  naive 
(BCN)  method,  then  some  limited  results  for  the  marginal  posterior  method.  Finally 
we  give  the  unconditional  EB  coverages  for  both  methods  in  a  unifying  example. 

I. First, we illustrate the bias corrected naive method's ability to achieve conditional EB coverage, regardless of the length of the naive intervals, using the normal/normal problem of Example 2.3. We assume (as do Laird and Louis in their numerical work) that the prior mean $\mu$ is known and equal to 0 w.l.o.g., but that the prior variance $\tau^2$ is unknown.

To implement bias correction given $Y_i$, we use $\hat B = p/(p + \sum_{j=1}^{p} Y_j^2)$ (Raghunathan, 1987). This estimator of $B$ is smooth, and its distribution has support (0, 1). This is not true of the MLE, the MVUE, or the truncated versions of them proposed by Morris (1983b) and Laird and Louis (1987). We then obtain $\alpha'(\hat B, y_i, \alpha)$. We compare intervals based on this bias correction with the naive EB interval (1.3) and with the classical frequentist interval (simply $Y_i \pm \sigma z_{1-\alpha/2}$ in this case). We took $B = .5$, $p = 10$ and the nominal $\gamma = .90$, and used 5000 replications.
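The sketch below reproduces only the classical and naive EB ingredients of this experiment. It assumes $\sigma^2 = 1$ and $\mu = 0$, the posterior $\theta_i \mid y_i \sim N((1 - B) y_i, 1 - B)$, and the estimator $\hat B = p/(p + \sum_j Y_j^2)$. It estimates unconditional, not conditional-on-$y_i$, coverage. It also omits the bias correction itself, because solving for $\alpha'$ requires (2.5), which is not reproduced here. The function name `simulate_normal_normal` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_normal_normal(B=0.5, p=10, gamma=0.90, reps=20_000, seed=0):
    """Unconditional coverage of classical and naive EB intervals for theta_1.

    Assumes sigma^2 = 1 and prior mean mu = 0, so Y_j | theta_j ~ N(theta_j, 1),
    theta_j ~ N(0, tau^2) and B = 1 / (1 + tau^2).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + gamma / 2)  # two-sided normal critical value
    tau = math.sqrt(1.0 / B - 1.0)             # prior standard deviation
    hits_classical = hits_naive = 0
    total_naive_length = 0.0
    for _ in range(reps):
        theta = [rng.gauss(0.0, tau) for _ in range(p)]
        y = [rng.gauss(t, 1.0) for t in theta]
        B_hat = p / (p + sum(v * v for v in y))  # smooth estimator with support (0, 1)
        # Classical interval: Y_1 +/- z.
        hits_classical += abs(y[0] - theta[0]) <= z
        # Naive EB interval: plug B_hat into the N((1 - B) y_1, 1 - B) posterior.
        center = (1.0 - B_hat) * y[0]
        half_width = z * math.sqrt(1.0 - B_hat)
        hits_naive += abs(center - theta[0]) <= half_width
        total_naive_length += 2.0 * half_width
    return hits_classical / reps, hits_naive / reps, total_naive_length / reps


cov_classical, cov_naive, mean_naive_length = simulate_normal_normal()
print(cov_classical, cov_naive, mean_naive_length)
```

Conditioning on $y_i$, as in Figure 3, would require binning the replications by the value of $Y_1$.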

(Note:  Insert  Figure  3  about  here) 
Figure 3 shows the resulting simulated coverage probability of these three intervals for $\theta_i$, conditional on $y_i$. The points plotted range from the .025 to the .975 percentile points of the unconditional distribution of $Y_i$, which in this case is $N(0, 2)$. Note that the classical method's conditional behavior is conservative for central $Y_i$'s but very poor in the tails.

The unusual aspect of this example is the pattern of lengths and conditional coverage of the naive EB intervals. They are too short and below the nominal level in the tails of $y_i$'s distribution, and too long and well above the nominal level in the middle. This is a result of the bias in our estimator $\hat B$. The conditional BCN (CBCN) method gives intervals that flatten out this pattern over $y_i$'s distribution. In addition, the simulated CBCN intervals were uniformly shorter than the inappropriately centered naive ones. They also had nearly constant average lengths, ranging from about 82% as long as classical in the tails of $y_i$'s distribution to about 75% as long as classical in the middle of the distribution. Of course, because the CBCN method achieves conditional EB coverage over $y_i$'s distribution, it also achieves unconditional EB coverage overall.

II. As a second example of the BCN method, consider the regression problem introduced in Example 2.4. For illustrative purposes we consider simple linear regression with $\theta_i = (\alpha_i, \beta_i)^T$, assuming $p = 5$ simultaneous regressions, each having only $n_i = 5$ observations. For convenience we take the $X_{ij}$ equally spaced, centered and scaled for each $i$. Let both the model variance $\sigma^2$ and the prior variance $\tau^2$ be known and equal to 1 w.l.o.g.

Since in this case a pivotal exists and $\alpha'$ is a function only of $\alpha$, unconditional bias correction (UBCN) produces exactly unconditional EB coverage, (1.5). An exact conditional bias correction (CBCN) given $Y_i$ may also be implemented. By Remark 1 and Example 2.4, if we choose an independent, unbiased estimate of the prior mean, we can again find $\alpha'$ as a function only of $\alpha$.

(Note:  Insert  Figure  4  about  here) 

Since our design makes the slope $\beta_i$ and the intercept $\alpha_i$ independent in the posterior family, we may obtain bias corrected intervals for them separately. We took the true values of the hyperparameters (the prior means of $\alpha_i$ and $\beta_i$) to be 0 and 1, respectively, and again used 5000 replications. Figure 4 shows simulated coverage probabilities, conditional on $\hat\beta_i = \sum_j X_{ij} Y_{ij}$, for the classical, naive EB, UBCN, and CBCN intervals for $\beta_i$. The points plotted cover ±3 standard deviations of the unconditional distribution of $\hat\beta_i$, which is $N(1, 2)$.

Again note the very poor conditional behavior of the classical method, and the poor conditional and unconditional behavior of the naive method. Of course, the UBCN method guarantees nominal unconditional behavior, but it also exhibits good conditional behavior in this case. The CBCN method's behavior is perfect, as advertised: its curve is completely flat at $\gamma = .90$ to the accuracy of the simulation (standard error ≈ .004).

Because we assumed all variances known, every method has a constant interval length. In this example the lengths are: classical, 3.29; naive EB, 2.33; UBCN, 2.55; and CBCN, 2.60. We can similarly bias correct exactly a simultaneous EB confidence rectangle for $(\alpha_i, \beta_i)$, conditional on both $\hat\alpha_i = \sum_{j=1}^{5} Y_{ij}/5$ and $\hat\beta_i$, and thus unconditionally.
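The classical and naive EB lengths quoted above can be checked by hand. The sketch assumes that the scaling of the $X_{ij}$ gives $\mathrm{Var}(\hat\beta_i \mid \beta_i) = 1$. This is consistent with the unconditional $N(1, 2)$ distribution of $\hat\beta_i$ when $\tau^2 = 1$. It also assumes that the naive interval, with both variances known, uses the posterior variance $1 \cdot 1/(1 + 1) = .5$. The variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.95)          # gamma = .90, so z is the .95 quantile
sampling_var = 1.0                      # assumed Var(beta_hat_i | beta_i)
prior_var = 1.0                         # tau^2 = 1
post_var = sampling_var * prior_var / (sampling_var + prior_var)

classical_len = 2 * z * sampling_var ** 0.5
naive_len = 2 * z * post_var ** 0.5
print(f"{classical_len:.2f} {naive_len:.2f}")  # 3.29 2.33
```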

III. To shed light on the question, raised in Section 3, of a good choice of marginal posterior, we return to the exponential/inverse gamma case of Example 3.2. We compare the sensitivity of the achieved EB coverage probabilities to the choice of $\hat\eta$ for three methods: the Laird and Louis bootstrap, the $\tau_1$ (flat hyperprior) matching bootstrap, and the $\tau_2$ matching bootstrap. From the discussion in Example 2.1, if $\hat\eta_c = c/\sum_{i=1}^{p} \log(Y_i + 1)$, appropriate choices for $c$ include $p$ (MLE), $p - 1$ (UMVUE), and $p + 1$ (best invariant under suitable squared error loss). The choice of $\hat\eta_c$ affects the scale parameter of the sampling density used for drawing the bootstrap $\hat\eta^*$'s.

We ran a simulation of 3000 replications, with $N = 400$ bootstrap observations per replication, $\eta = 2$, $p = 5$, and nominal $\gamma = .95$, to compare these three methods over the three choices of $\hat\eta$. The results are summarized in Table 1, which shows achieved EB coverage probability, with interval length in parentheses.
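The roles of $c$ can be seen from the distribution of $S = \sum_i \log(1 + Y_i)$. Assume $\theta_i \sim IG(\eta, 1)$ means $1/\theta_i \sim$ Gamma$(\eta, 1)$, and that $Y_i \mid \theta_i$ is exponential with mean $\theta_i$. Then $Y_i$ has marginal density $\eta(1 + y)^{-(\eta + 1)}$, so $\log(1 + Y_i)$ is exponential with rate $\eta$ and $S \sim$ Gamma$(p, \text{rate } \eta)$. Hence $E(\hat\eta_c) = c\,\eta/(p - 1)$, which is why $c = p - 1$ gives the unbiased estimator.

The Monte Carlo sketch below checks these means at $\eta = 2$, $p = 5$. The helper `mean_eta_hats` is hypothetical, and the best invariant property itself is not checked.

```python
import math
import random
import statistics


def mean_eta_hats(eta=2.0, p=5, reps=50_000, seed=2):
    """Monte Carlo means of eta_hat_c = c / sum(log(Y_i + 1)), c in {p, p-1, p+1}."""
    rng = random.Random(seed)
    draws = {"mle": [], "umvue": [], "best_invariant": []}
    for _ in range(reps):
        # Assumed parametrization: theta_i ~ IG(eta, 1) means 1/theta_i ~ Gamma(eta, scale 1).
        theta = [1.0 / rng.gammavariate(eta, 1.0) for _ in range(p)]
        # Y_i | theta_i ~ Exponential with mean theta_i (rate 1/theta_i).
        y = [rng.expovariate(1.0 / t) for t in theta]
        s = sum(math.log1p(v) for v in y)  # S | eta ~ Gamma(shape=p, rate=eta)
        draws["mle"].append(p / s)
        draws["umvue"].append((p - 1) / s)
        draws["best_invariant"].append((p + 1) / s)
    return {name: statistics.fmean(vals) for name, vals in draws.items()}


means = mean_eta_hats()
# Theory: E(c / S) = c * eta / (p - 1), i.e. 2.5, 2.0 and 3.0 for eta = 2, p = 5.
print(means)
```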


(Note:  Insert  Table  1  about  here) 
The Laird and Louis bootstrap is extremely sensitive to the choice of $\hat\eta$. The $\tau_1$ matching bootstrap is stable but fails to achieve nominal coverage probability. The $\tau_2$ matching bootstrap is stable with respect to the choice of $\hat\eta$ and also achieves nominal coverage.

IV. Finally, we compare all the methods discussed in the context of the exponential/inverse gamma problem of Examples 2.1 and 3.2. For fixed $\eta$ and $p$, we generated the $\theta_i$'s i.i.d. as $IG(\eta, 1)$, and then generated the $Y_i$'s independently as Exponential$(\theta_i)$, $i = 1, \ldots, p$. Each simulation is again based on 3000 replications. For the methods requiring a bootstrap, we again used $N = 400$ bootstrap trials per replication.

Table 2 shows the lower endpoint, upper endpoint, interval length and unconditional EB coverage probability, all averaged over both $i$ and the replications. Six methods are compared: classical, naive EB, unconditional BCN, the Laird and Louis bootstrap, the $\tau_1$ matching bootstrap, and the $\tau_2$ matching bootstrap. The settings are $p = 5$, true $\eta = 2$ and 5, and nominal individual coverage probabilities $\gamma = .90$ and .95.

The bias corrected method is affected by the choice of $\hat\eta$ in three places: in the computation of the $R$ function (2.3) (we need the distribution of $\hat\eta \mid \eta$), in solving (2.5), and in the estimated posterior distribution. In our simulation, for the naive and bias corrected naive methods, we show results obtained using the marginal UMVUE. Results (not shown) obtained using the marginal MLE gave longer (i.e. too conservative) bias corrected intervals (extending further to the right), but shorter naive intervals. For the three bootstrap methods we also used the UMVUE for $\eta$. Table 1 shows this to be the best choice for the Laird and Louis method, which is the only bootstrap method sensitive to this choice. Recall also that unbiasedness is assumed in our rough argument at the end of Section 3.

(Note:  Insert  Table  2  about  here) 

Several points can be made from Table 2. As expected, the classical intervals faithfully achieve the desired coverages, but are quite long compared to the better EB intervals. The naive EB intervals fail to achieve nominal coverage and are very poor for large $\eta$ with our small $p$. The bias corrected naive intervals, on the other hand, achieve the desired nominal coverage to the accuracy of the table (the coverage probabilities have a standard error of about .005). The Laird and Louis and $\tau_2$ matching bootstrap intervals generally achieve the desired coverage, yet the latter are substantially shorter. The intervals based on matching the flat hyperprior $\tau_1$ are shifted to the left of those based on $\tau_2$ and generally fail to achieve the desired coverage probability. Apparently this hyperprior puts too much weight on large values of $\eta$.
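The standard errors quoted for the simulated coverages are close to the binomial value $\sqrt{\gamma(1 - \gamma)/n}$ for $n$ replications. The sketch below evaluates it for the settings used in this section. It treats replications as independent Bernoulli trials. In Table 2, coverage is also averaged over the $p$ units within a replication, which share $\hat\eta$, so for that table the formula is only a rough guide. The helper `coverage_se` is illustrative.

```python
import math


def coverage_se(gamma, reps):
    """Binomial standard error of a coverage estimate from independent replications."""
    return math.sqrt(gamma * (1.0 - gamma) / reps)


# Replication counts used in this section: 5000 (Parts I and II), 3000 (Parts III and IV).
for gamma, reps in [(0.90, 5000), (0.90, 3000), (0.95, 3000)]:
    print(f"gamma={gamma:.2f} reps={reps} se={coverage_se(gamma, reps):.4f}")
```

This gives about .0042 for 5000 replications at $\gamma = .90$, and between .004 and .0055 for 3000 replications. These values agree with the figures quoted in the text.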

5. CONCLUSION

In this paper we have developed a general method to conditionally correct the bias in naive empirical Bayes confidence intervals. We have also attempted to clarify and expand on the idea of using bootstrap observations to accomplish a marginal posterior Bayes solution. We conclude that the bias correction method is attractive because of its general applicability, its straightforward implementation, and its direct attack on the deficiencies of the naive EB interval. The marginal posterior approach can also be quite successful. However, the choice of a good mixing distribution $h$ (equivalently, a good hyperprior $\tau$) is critical and might require preliminary investigation. Furthermore, it is not clear how to implement this approach for a given $\tau$ in the absence of a closed form for $p$, the sampling density of $\hat\eta$.
TABLE 1. Comparison of Marginal Posterior Methods

Entries: achieved EB coverage probability (interval length); nominal $\gamma = .95$.

| Estimator of $\eta$ | Laird & Louis | $\tau_1$ Matching | $\tau_2$ Matching |
|---|---|---|---|
| UMVUE | .954 (7.50) | .930 (4.51) | .951 (5.66) |
| MLE | .931 (4.50) | .928 (4.33) | .950 (5.61) |
| Best Invariant | .865 (2.76) | .926 (4.14) | .948 (5.40) |

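TABLE 2. Comparison of Bias Corrected and Marginal Posterior Methods, $p = 5$

| $\eta$ | $\gamma$ | Interval Method | Average Lower Endpoint | Average Upper Endpoint | Average Interval Length | Average Uncond'l Cov. Prob. |
|---|---|---|---|---|---|---|
| 2 | .90 | Classical | .335 | 19.5 | 19.2 | .901 |
| 2 | .90 | Naive EB | .355 | 3.87 | 3.51 | .839 |
| 2 | .90 | Bias Corrected | .331 | 4.74 | 4.41 | .897 |
| 2 | .90 | Laird and Louis | .339 | 5.15 | 4.81 | .904 |
| 2 | .90 | $\tau_1$ Matching | .287 | 3.23 | 2.95 | .868 |
| 2 | .90 | $\tau_2$ Matching | .311 | 4.00 | 3.69 | .894 |
| 2 | .95 | Classical | .268 | 39.1 | 38.8 | .952 |
| 2 | .95 | Naive EB | .306 | 5.53 | 5.22 | .900 |
| 2 | .95 | Bias Corrected | .285 | 7.84 | 7.55 | .952 |
| 2 | .95 | Laird and Louis | .283 | 7.79 | 7.50 | .954 |
| 2 | .95 | $\tau_1$ Matching | .246 | 4.46 | 4.51 | .930 |
| 2 | .95 | $\tau_2$ Matching | .265 | 5.93 | 5.66 | .951 |
| 5 | .90 | Classical | | 4.89 | 4.81 | .899 |
| 5 | .90 | Naive EB | | .690 | .556 | .771 |
| 5 | .90 | Bias Corrected | .116 | 1.03 | .914 | .902 |
| 5 | .90 | Laird and Louis | .114 | 1.04 | .928 | .899 |
| 5 | .90 | $\tau_1$ Matching | .092 | .620 | .528 | .863 |
| 5 | .90 | $\tau_2$ Matching | .102 | .810 | .708 | .901 |
| 5 | .95 | Classical | .068 | 9.87 | 9.81 | .948 |
| 5 | .95 | Naive EB | .120 | .859 | .739 | .846 |
| 5 | .95 | Bias Corrected | .103 | 1.67 | 1.57 | .956 |
| 5 | .95 | Laird and Louis | .096 | 1.41 | 1.31 | .951 |
| 5 | .95 | $\tau_1$ Matching | .081 | .816 | .735 | .918 |
| 5 | .95 | $\tau_2$ Matching | .089 | 1.10 | 1.01 | .947 |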

Figure  2.  Generic  Illustration 


Figure 3. Conditional Coverage Probabilities, Normal/Normal Case, Unknown Prior Variance: EB Confidence Interval for $\theta_i$ at Nominal $\gamma = .9$ (true $B = .5$)

Figure 4. Conditional Coverage Probabilities, Simple Linear Regression Case: Individual EB Confidence Interval for $\beta_i$ at Nominal $\gamma = .9$